ABSTRACT

With the development of Massive Open Online Courses (MOOCs) in recent
years, discussion forums have become one of the most important
channels through which students and instructors exchange ideas. MOOC
forums effectively act as social learning media for knowledge
propagation. To better understand this emerging learning setting, we
explore the social relationships in the forums by modeling each forum
as a heterogeneous network, drawing on theories of social network
analysis. We discover a specific group of students, named
representative students, who engage heavily in discussions and
aggregate a large share of the whole forum’s participation, but who
show neither exceptional learning behavior nor the best performance.
Based on these discoveries, answering representative students’
threads preferentially could not only save instructors the time of
choosing target posts from all posts, but also propagate knowledge as
widely as possible. Furthermore, if extra attention is paid to the
behavior, performance and posts of representative students,
instructors could readily get feedback on teaching quality, recognize
the major concerns in the forums, and then take measures to improve
the teaching program. We also develop a real-time and effective
visualization tool to help instructors achieve these goals.

Keywords 

MOOC forum, Coursera, influence, behavior, performance, heterogeneous
network

1. INTRODUCTION

Compared with traditional distance education or online courses,
discussion forums in Massive Open Online Courses (MOOCs) offer a large
and lively venue for communication between students and instructors,
which has been shown to be important for large-scale social learning
[1, 7, 9]. However, because of their scale, the forums are full of
information both relevant and irrelevant to the course [6]. How to
extract valuable information from such large-scale settings quickly
and accurately has therefore become a pressing problem.


Like Twitter, Facebook or Stack Overflow, MOOC forums resemble social
media because of their large number of participants and their
interactivity. Every member of the forum may talk about course
content, for example by asking or answering a question. The intensive
interaction between members supports knowledge propagation within the
learning community. However, this raises a dilemma. For the sake of
knowledge propagation, the proportion of instructors’ responses should
be as large as possible in order to resolve students’ questions; but
given the scale, instructors do not have enough time to read every
thread. To cope with this situation, we propose a trade-off solution
that extracts influential students from the whole population and
recommends them to instructors. Instructors can then make decisions on
a much smaller scale, and their effort is amplified according to the
principles of influence propagation [12, 16, 24].

Although influence can be defined differently from different
perspectives, in this paper we consider only the instructor’s
perspective and set the others aside for the time being. We conceive
that in each forum there is a group of influential students who
attract many others to interact with them, much like verified accounts
on Twitter. We call them ‘representative students’; they involuntarily
take on responsibility for knowledge propagation. Instructors could
therefore amplify the influence of correct answers by preferentially
responding to the questions of representative students. Many more
students who pay attention to the answers given to representative
students would then also benefit without receiving a direct response
from the instructor. In addition, given that representative students’
threads tend to attract a lot of attention, instructors could address
the main concerns of the learning community more promptly. Through the
ranked list of influence, the chief instructor could also check
whether other instructors (or TAs) are on duty, since TAs’ influence
is computed at the same time. As we show later in this paper,
representative students’ performance is not the best within the
learning community, but given their positive motivation and high
volume of messages, promptly answering their questions is beneficial
for the whole learning community.

Since posts irrelevant to the course, such as chatting or making
friends, are unavoidable in such a free forum, it is not reasonable to
directly regard superposters [9] as representative students or to
consider only their social relationships. Experiments later in this
paper support this view and show that post contents are useful.
Because we regard the interaction in MOOC forums as a process of
knowledge propagation in social media, we build a heterogeneous
network [23] to model the forum with two kinds of entities, leveraging
theories of ranking networked entities. We can then obtain a ranked
list of students’ influence from that network with a specially
designed algorithm. The higher a student ranks on the list, the more
influential she is. This model makes full use of both social
information and textual messages to avoid outliers or exceptions (e.g.
someone who always submits posts irrelevant to the course).

To our knowledge, this is the first work to adopt a heterogeneous
network to model social relationships in MOOC forums and extract
representative students. We also propose a novel graph-based algorithm
for ranking students’ influence. Experimental results show that the
algorithm is both effective and efficient. Through the analysis of
representative students’ log data, we find that they engage highly and
aggregate much participation but do not earn the best grades, which
suggests that instructors can monitor the class through them and that
they are the low-hanging fruit for increasing the passing rate.
Analysis of historical records of interaction between instructors and
students indicates that recommending representative students’ threads
to instructors is time-saving and meaningful. Based on these
discoveries, we developed a web-based visualization tool that assists
instructors in supervising their class with less effort.

2. RELATED WORK 

In traditional offline classes, the scale is relatively small and
face-to-face Q&A is not a challenge. In traditional online education
or online video classes, the scale is not large either, and the
absence of instructors is very common. For MOOCs, however, it is
widely held that engaging students in a social learning environment is
important for guaranteeing and improving teaching quality
[1, 6, 7, 18].

In the field of Community Question Answering (CQA), the issues most
related to this paper are expert finding and forum search [21].
Recently, several novel methods for finding experts in CQA have been
proposed [26, 29, 30]. Nevertheless, experts are rare in a MOOC forum,
because a MOOC forum is not open to all kinds of discussions; it
belongs to its course and exists for students to acquire knowledge.
The definition of representative students here is also different from
that of experts. On the other hand, the task of discovering
representative learners and their posts resembles forum search
[3, 19], which develops a mechanism analogous to a search engine.
Here, however, we concentrate on the ranking result and do not
emphasize retrieval accuracy. Beyond this general forum-related work,
several studies of MOOC forums have recently been published from
various perspectives. For example, Yang et al. [25] recommended
threads to MOOC students with an adaptive feature-based matrix
factorization framework. Wen et al. [22] analyzed sentiment in MOOC
forums from students’ words to monitor trends in their opinions. Stump
et al. [20] proposed a framework to classify forum posts.

The classical PageRank [5] and HITS [14] algorithms have been applied
to a broad range of problems in ranking networked entities and have
been extended to heterogeneous networks [11, 15, 27]. The authors of
[17, 28] built a heterogeneous network with two types of nodes to
discover influential authors in scientific repository data, which is
similar to our work. The common point is to discover influential
entities by iterating over a graph model. In this paper, we leverage
that principle and build a new heterogeneous network to model MOOC
forums and discover representative students.

Besides, many analyses of MOOC logs also involve forums. Anderson et
al. [2] deployed a system of badges to produce incentives for activity
and contribution in the forum based on behavior patterns. Huang et al.
[9] specifically analyzed the behavior of superposters in 44 MOOC
forums and found that MOOC forums are mostly healthy. Kizilcec et al.
[13] studied the behavior of student disengagement. Some technical
reports and case studies also involve behavior analysis of MOOC
students in forums, such as [8] and [4]. Nevertheless, we believe that
incentives based on intelligent analysis of various data, such as
social information and textual messages, would be more reasonable than
the pure credit mechanism of traditional forums, since the latter
considers only the quantity of behavior and not its quality.

Table 1: Pairs of course code and course title

Course Code             Course Title
peopleandnetworks-001   Networks and Crowds
arthistory-001          Art History
dsalgo-001              Data Structures and Algorithms A
pkuic-001               Introduction to Computing
aoo-001                 The Advanced Object-Oriented Technology
bdsalgo-001             Data Structures and Algorithms B
criminallaw-001         Criminal Law
pkupop-001              Practice on Programming
chemistry-001           General Chemistry (Session 1)
chemistry-002           General Chemistry (Session 2)
pkubioinfo-001          Bioinformatics: Introduction and Methods (Session 1)
pkubioinfo-002          Bioinformatics: Introduction and Methods (Session 2)

Table 2: Statistics per course

Course                  # threads  # posts  # votes
peopleandnetworks-001   219        1,206    304
arthistory-001          273        2,181    1,541
dsalgo-001              283        1,221    266
pkuic-001               1,029      5,942    595
aoo-001                 97         515      204
bdsalgo-001             319        1,299    132
criminallaw-001         118        763      648
pkupop-001              1,085      6,443    977
chemistry-001           110        591      65
chemistry-002           167        715      678
pkubioinfo-001          361        2,139    1,474
pkubioinfo-002          170        942      235
Overall                 4,259      24,042   -

3. DATASET 

We use all the log data of 12 courses from the Coursera platform,
offered in the Fall Semester of 2013 and the Spring Semester of 2014.
In total there are over 4,000 threads and over 24,000 posts. For
convenience later in the paper, Table 1 lists the pairs of course code
and course title. Table 2 shows the statistics of the dataset per
course. Here ‘posts’ denotes responses, including both posts and
comments. Both the subjects and the scales vary widely.

4. MODEL AND ALGORITHM 

In order to model MOOC forums as social media, the first challenge is
that no explicit post-reply relationship, describing who replies to
whom, is recorded. We simplify this problem and assume that if two
students appear in the same thread, they share the same topic
interests, and the one whose post is chronologically later replies to
the other. As mentioned in previous sections, the post contents of
representative students should be course-related, so extracting only
the post-reply relationship may not be enough to meet that
requirement. Based on the fact that most post contents are
course-related [9], we add keywords as another kind of entity to the
model to construct the heterogeneous network. The keywords here are
all the meaningful nouns in post contents, and they can represent
various aspects of topics. Other parts of speech are unexplored at
present. The role of keywords in the heterogeneous network is to help
the algorithm reinforce the influence of students who are involved in
more topics, which ensures that the posts of representative students
are course-related. Figure 1 illustrates the heterogeneous network,
and Table 4 lists the notations.

Table 3: Attributes of the heterogeneous network constructed per course

                       G_S                       G_K                            G_SK
Course                 n_S  |E_S|   |E_S|/n_S^2  n_K    |E_K|      |E_K|/n_K^2  |E_SK|  |E_SK|/(n_S+n_K)^2
peopleandnetworks-001  321  3,287   0.032        1,193  104,821    0.074        4,814   0.002
arthistory-001         540  17,022  0.058        3,376  1,019,289  0.089        14,195  0.001
dsalgo-001             295  1,876   0.022        1,152  124,118    0.094        5,009   0.002
pkuic-001              768  19,801  0.034        2,302  302,989    0.057        14,599  0.002
aoo-001                175  1,963   0.064        783    73,208     0.119        2,597   0.003
bdsalgo-001            225  2,369   0.047        781    23,540     0.039        3,133   0.003
criminallaw-001        219  2,971   0.062        1,224  123,737    0.083        4,577   0.002
pkupop-001             628  12,883  0.033        1,748  88,035     0.029        13,807  0.002
chemistry-001          130  886     0.052        1,055  111,026    0.100        2,685   0.002
chemistry-002          125  2,341   0.150        964    61,425     0.066        2,574   0.002
pkubioinfo-001         594  22,275  0.063        686    46,768     0.099        1,946   0.001
pkubioinfo-002         189  1,746   0.049        380    16,662     0.115        784     0.002

Table 4: Notations

Notation                     Description
G = (V, E, W)                heterogeneous network
G_S = (V_S, E_S, W_S)        student subnetwork
G_K = (V_K, E_K, W_K)        keyword subnetwork
G_SK = (V_SK, E_SK, W_SK)    bipartite subnetwork
n_S, n_K                     |V_S|, |V_K|



Figure 1: Demo of the heterogeneous network $G$. Circles denote $V_S$
and rectangles denote $V_K$. Solid lines with arrows denote the
co-presence of students in the same thread; each arrow points from the
student whose post is later to the other. Dashed lines with arrows
denote the co-presence of keywords in the same thread; a directed
arrow means the two keywords appear in different posts, and a
bidirectional arrow means they appear in the same post. Dashed lines
without arrows denote authorship between students and keywords. The
weight on each edge is the number of co-presences of the two entities
it connects. Self co-presence is meaningless and is ignored.
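
The construction rules described for Figure 1 can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the input format (each thread as a chronologically ordered list of (student, keywords) posts), the function name build_network, and the choice to count one co-presence per pair of posts are not specified in the paper. Keyword edges between different posts are assumed to point from the later post to the earlier one, by analogy with student edges.

```python
from collections import defaultdict
from itertools import combinations


def build_network(threads):
    """Build weighted edge dictionaries for G_S, G_K and G_SK.

    threads: list of threads; each thread is a chronologically ordered
    list of (student_id, keywords) posts (hypothetical input format).
    """
    w_s = defaultdict(int)   # (later student, earlier student) -> count
    w_k = defaultdict(int)   # (keyword, keyword) -> count
    w_sk = defaultdict(int)  # (student, keyword) -> authorship count
    for posts in threads:
        for idx, (student, keywords) in enumerate(posts):
            kws = sorted(set(keywords))
            for kw in kws:
                w_sk[(student, kw)] += 1
            # Keywords in the same post: bidirectional edge.
            for a, b in combinations(kws, 2):
                w_k[(a, b)] += 1
                w_k[(b, a)] += 1
            for earlier_student, earlier_keywords in posts[:idx]:
                # Later poster points to earlier poster; self-loops ignored.
                if earlier_student != student:
                    w_s[(student, earlier_student)] += 1
                # Assumption: keywords in different posts get a directed
                # edge from the later post's keyword to the earlier one.
                for a in kws:
                    for b in set(earlier_keywords):
                        if a != b:
                            w_k[(a, b)] += 1
    return dict(w_s), dict(w_k), dict(w_sk)
```

Counting per pair of posts rather than per thread makes students who reply repeatedly in long threads weigh more; a per-thread count would be an equally plausible reading of the caption.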

This model captures the characteristic that representative students
have more latent post-reply relationships and are involved in more
topics. After building the network from the log dataset, we calculate
the basic attributes of the graphs per course (Table 3).

To co-rank students and keywords, we need an algorithm. We simulate
two random surfers jumping and walking in the heterogeneous network
and design an algorithm named Jump-Random-Walk (JRW). We assume that
the weights $W$ represent the influence between entities; the
algorithm’s task is to discover the most influential students, namely
the representative students. Figure 2 shows the framework of the JRW
algorithm.



Figure 2: The framework of the Jump-Random-Walk algorithm. $\beta$ is
the probability of walking along an edge within $G_S$ or $G_K$.
$\lambda$ is the probability of jumping from $G_S$ to $G_K$ or in
reverse. $\lambda = 0$ means discovering representative students by
using only the post-reply relationship. We assume the probabilities of
each jump or walk are consistent.


Let $s \in \mathbb{R}^{n_S}$ and $k \in \mathbb{R}^{n_K}$ be the
ranking result vectors, which are also probability distributions,
whose entries correspond to the entities of $V_S$ and $V_K$, subject
to $\|s\|_1 \le 1$ and $\|k\|_1 \le 1$. Denote the four transition
matrices of $G_S$, $G_K$, $G_{SK}$ and $G_{KS}$ used in the iteration
as $S \in \mathbb{R}^{n_S \times n_S}$, $K \in \mathbb{R}^{n_K \times
n_K}$, $SK \in \mathbb{R}^{n_S \times n_K}$ and $KS \in
\mathbb{R}^{n_K \times n_S}$ respectively. Adding the probability of
random jumping, to avoid being trapped in a connected subgraph or in a
set of entities with no out-degree, the iteration functions are

$$s = (1 - \lambda)\left(\beta S s + (1 - \beta)\, e_{n_S} / n_S\right) + \lambda\, SK\, k, \qquad (1)$$

$$k = (1 - \lambda)\left(\beta K k + (1 - \beta)\, e_{n_K} / n_K\right) + \lambda\, KS\, s, \qquad (2)$$

where $e_{n_S} \in \mathbb{R}^{n_S}$ and $e_{n_K} \in \mathbb{R}^{n_K}$
are the vectors whose entries are all 1. The mathematical forms of the
four transition matrices are

$$S_{i,j} = \frac{w^{S}_{i,j}}{\sum_i w^{S}_{i,j}}, \quad \text{where } \sum_i w^{S}_{i,j} \ne 0, \qquad (3)$$

$$K_{i,j} = \frac{w^{K}_{i,j}}{\sum_i w^{K}_{i,j}}, \quad \text{where } \sum_i w^{K}_{i,j} \ne 0, \qquad (4)$$

$$SK_{i,j} = \frac{w^{SK}_{i,j}}{\sum_i w^{SK}_{i,j}}, \qquad (5)$$

$$KS_{i,j} = \frac{w^{KS}_{i,j}}{\sum_i w^{KS}_{i,j}}, \quad \text{where } \sum_i w^{KS}_{i,j} \ne 0. \qquad (6)$$

$w^{S}_{i,j}$ is the weight of the edge from $V^S_i$ to $V^S_j$,
$w^{K}_{i,j}$ is the weight of the edge between $V^K_i$ and $V^K_j$,
$w^{SK}_{i,j}$ is the weight of the edge between $V^S_i$ and $V^K_j$,
and $w^{KS}_{i,j}$ is the weight of the edge between $V^K_i$ and
$V^S_j$. Actually $w^{SK}_{i,j} = w^{KS}_{j,i}$. When $\sum_i
w^{S}_{i,j} = 0$, the student $V^S_j$ is always the last one in a
thread. If $\sum_i w^{K}_{i,j} = 0$, the keyword $V^K_j$ never has a
peer in a thread; this situation almost never happens in our filtered
data. $\sum_i w^{SK}_{i,j} = 0$ is also impossible, which means that
every keyword has at least one author (student). In contrast, it is
not guaranteed that every student posts at least one keyword, because
some posts contain nothing valuable or no noun keyword. Algorithm 1
gives the details of the JRW algorithm.

Figure 3: NDCG@5% scores of different rankings
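
The iteration in Equations (1) and (2) can be tried out on a toy network with the following minimal Python sketch. It is an illustration and not the authors' implementation: the function names, the dictionary input format keyed by (row, column), the L1 stopping rule, the iteration cap and the default tolerance eps=1e-8 are all assumptions made here. Columns whose weights sum to zero are simply left as zero columns.

```python
def column_normalize(weights, n_rows, n_cols):
    """Return M with M[i][j] = w[i, j] / sum_i w[i, j].

    weights: dict mapping (i, j) -> non-negative weight.
    Columns with zero total weight stay all-zero (assumption).
    """
    col_sum = [0.0] * n_cols
    for (_, j), w in weights.items():
        col_sum[j] += w
    m = [[0.0] * n_cols for _ in range(n_rows)]
    for (i, j), w in weights.items():
        if col_sum[j] != 0:
            m[i][j] = w / col_sum[j]
    return m


def matvec(m, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def jump_random_walk(S, K, SK, KS, beta=0.85, lam=0.5, eps=1e-8,
                     max_iter=1000):
    """Iterate Equations (1) and (2) from uniform starting vectors."""
    n_s, n_k = len(S), len(K)
    s = [1.0 / n_s] * n_s
    k = [1.0 / n_k] * n_k
    for _ in range(max_iter):
        s_prev, k_prev = s, k
        walk_s, jump_s = matvec(S, s_prev), matvec(SK, k_prev)
        walk_k, jump_k = matvec(K, k_prev), matvec(KS, s_prev)
        s = [(1 - lam) * (beta * a + (1 - beta) / n_s) + lam * b
             for a, b in zip(walk_s, jump_s)]
        k = [(1 - lam) * (beta * a + (1 - beta) / n_k) + lam * b
             for a, b in zip(walk_k, jump_k)]
        # Stop when the student scores change by less than eps (L1 norm).
        if sum(abs(a - b) for a, b in zip(s, s_prev)) < eps:
            break
    return s, k
```

Sorting the indices of s in descending order then gives the ranked list of students; setting lam=0 reduces the student update to a PageRank-style walk on the student subnetwork alone.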



Figure 4: Iteration speed of Jump-Random-Walk

Algorithm 1 Jump-Random-Walk on G

INPUT: $S$, $K$, $SK$, $KS$, $\beta$, $\lambda$, $\epsilon$
$s \leftarrow e_{n_S} / n_S$
$k \leftarrow e_{n_K} / n_K$
repeat
    $\tilde{s} \leftarrow s$
    $\tilde{k} \leftarrow k$
    $s = (1 - \lambda)(\beta S \tilde{s} + (1 - \beta)\, e_{n_S} / n_S) + \lambda\, SK\, \tilde{k}$
    $k = (1 - \lambda)(\beta K \tilde{k} + (1 - \beta)\, e_{n_K} / n_K) + \lambda\, KS\, \tilde{s}$
until $|s - \tilde{s}| < \epsilon$
return $s$, $k$


5. EXPERIMENTS

We do not exclude the data of instructors (or TAs) and regard everyone
in the forums as ‘students’, so that instructors’ influence can also
be evaluated in a uniform framework. Since the courses are all in
Chinese and the contents are overwhelmingly in simplified or
traditional Chinese, we filter out the non-Chinese contents in the
preprocessing step with a Chinese word segmentation tool, which is
essential for extracting Chinese keywords. We also filter out the HTML
tags that occur irregularly. During this process, most spam and
valueless posts are filtered out incidentally.
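
As a rough illustration of the filtering part of this preprocessing, the sketch below strips HTML tags and keeps only Chinese characters using the Python standard library. The function name, the regular expressions and the restriction to the basic CJK Unified Ideographs range are assumptions made here; the paper does not name its tool, and the word segmentation and noun extraction steps would need a dedicated segmenter that is not shown.

```python
import re

# Hypothetical patterns: anything that looks like an HTML tag, and any
# run of characters outside the basic CJK Unified Ideographs block.
TAG_RE = re.compile(r"<[^>]+>")
NON_CJK_RE = re.compile(r"[^\u4e00-\u9fff]+")


def clean_post(raw_html):
    """Remove HTML tags and non-Chinese content from one post."""
    text = TAG_RE.sub(" ", raw_html)
    text = NON_CJK_RE.sub(" ", text)
    # Collapse the whitespace left behind by the two substitutions.
    return " ".join(text.split())
```

A post that comes back empty contains no Chinese text at all and can be dropped, which is one way such a filter would remove spam and valueless posts as a side effect.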


To evaluate the effectiveness of JRW, we compare it with the
competitors listed below.

• Post the most (PM), for superposters by quantity. The more posts a
student submits and the more frequently she posts, the higher she
ranks.

• Be voted the most (VM), for superposters by quality. The larger the
ratio of the number of votes a student earns to the average number of
votes in the forum, the higher she ranks.

• Reputation (RE), for superposters by reputation. This is a
reputation score maintained by the Coursera platform and can be seen
as a measure of both the quantity and the quality of a forum student’s
contribution.

• PageRank (PR), for representative students by post-reply
relationship only. It computes each forum student’s influence only in
$G_S$ with the PageRank algorithm.

• Jump-Random-Walk (JRW), for representative students. It co-ranks the
influence of both forum students and keywords simultaneously in $G$.

Table 5: Representative students’ behavior and performance. P(R|T) is
the proportion of threads initiated by representative students among
all threads. P(R|P) is the proportion of posts by representative
students among all posts. Over Rate is the deviation between the
average number of posts per thread initiated by representative
students and that over all threads. P(R|V) is the proportion of
video-watching events by representative students among all such
events. P(R|Q) is the proportion of quiz submissions by representative
students among all submissions. P(R|C) and P(R|C,D) are the
proportions of representative students among all certified students
and among all students certified with distinction. Precise is the
proportion of instructors’ posts in threads initiated by
representative students among all instructors’ posts. Recall is the
proportion of threads initiated by representative students that were
replied to by instructors.

                       Forum Behavior             Learning Behavior   Performance       Instructor
Course                 P(R|T)  P(R|P)  Over Rate  P(R|V)    P(R|Q)    P(R|C)  P(R|C,D)  Precise  Recall
peopleandnetworks-001  0.205   0.246   1.182      0.084     0.074     0.126   0.167     0.267    0.556
arthistory-001         0.289   0.335   1.125      0.102     0.074     0.109   0.188     0.453    0.190
dsalgo-001             0.177   0.355   5.961      0.061     0.082     0.075   0.038     0.182    0.540
pkuic-001              0.282   0.444   -0.649     0.077     0.088     0.117   0.151     0.328    0.545
aoo-001                0.247   0.328   1.446      0.090     0.056     0.071   0.042     0.351    0.583
bdsalgo-001            0.210   0.473   0.401      0.110     0.047     0.047   0.054     0.286    0.866
criminallaw-001        0.246   0.326   1.524      0.060     0.067     -       -         0.504    0.793
pkupop-001             0.283   0.428   1.122      0.095     0.091     0.126   0.212     0.356    0.596
chemistry-001          0.082   0.367   1.706      0.050     0.076     0.078   0.079     0.207    1.000
chemistry-002          0.413   0.494   0.707      0.056     0.042     0.071   0.036     0.362    0.696
pkubioinfo-001         0.260   0.332   -0.963     0.097     0.061     0.075   0.061     0.284    0.713
pkubioinfo-002         0.200   0.445   0.282      0.029     0.035     0.028   0.035     0.210    0.706


To allow comparison with superposters, we use the same criterion: a
student is called a representative student when she is within the top
5% of the ranked list. Note that alternative criteria, such as a
threshold on an absolute number, are also feasible. The parameters
used in JRW are $\beta = 0.85$, $\lambda = 0.5$ and $\epsilon$ = 10“®.
We also tried $\lambda = 0.2$ and $\lambda = 0.8$, but the differences
were tiny. We adopt Normalized Discounted Cumulated Gain (NDCG) [10]
as the metric, which is suitable for evaluating the quality of
rankings. We invited two human judges who are both experienced in MOOC
forums. They scored the influence of each top 5% student by reading
all the contents of the related threads. Each thread and post was
preprocessed to be anonymous and unordered. Scores take the values 0,
1, 2 and 3, denoting strongly disagree, disagree, agree and strongly
agree respectively. Finally the two assessments are averaged.
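
For readers unfamiliar with the metric, the following Python sketch shows one common way to compute NDCG from averaged judge scores. It is a sketch under assumptions: the logarithmic discount log2(rank + 1) and the normalization by the ideal ordering of the same list are the widely used textbook form, which may differ in detail from the exact variant of [10] used in the experiments, and the function names are chosen here for illustration.

```python
import math


def averaged_gains(judge_a, judge_b):
    """Average two judges' 0-3 scores, position by position."""
    return [(a + b) / 2 for a, b in zip(judge_a, judge_b)]


def dcg(gains):
    """Discounted cumulated gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG of the ranking divided by the DCG of its ideal reordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

Here gains is the list of averaged scores in the order produced by one ranking method (PM, VM, RE, PR or JRW), so a method that places highly rated students first scores close to 1.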

Figure 3 shows the results of the human assessment. JRW outperforms
the others, including PR, in the majority of courses, which suggests
the necessity of building such a heterogeneous network for discovering
representative students. If instructors set a rule to incentivize
representative students, JRW would also be more objective and fairer
than simple rankings based on the quantity of behavior. One notable
phenomenon is that the students voted the most are not representative.
A possible reason is that the majority of forum students are not used
to voting for influential posts, while unusual comments earn many
votes. In addition, we carry out a convergence analysis of the JRW
algorithm. Figure 4 shows that the algorithm converges rapidly and can
satisfy the requirement of real-time computation in large-scale
applications.

6. ANALYSIS OF REPRESENTATIVE STUDENTS

In this section, we explore the characteristics of representative
students in terms of behavior and performance. Then, based on the
proposed model and algorithm, we develop a web service that helps
instructors monitor not only the behavior and performance of each
student but also their relative position compared with the average
level of the whole class. This service helps instructors gain feedback
on teaching quality.

6.1 Behavior and Performance 

First, we analyze the differences in behavior between representative
and non-representative students from a statistical view. Table 5
shows, per course, the proportions of various behaviors of
representative students relative to all forum students. The Forum
Behavior columns contain three indicators, among which $P(R|T)$ and
$P(R|P)$ reflect the degree of representative students’ participation
in forums. An Over Rate above zero means that representative students’
threads are more popular than average, and vice versa. The values of
the three indicators suggest that in most forums representative
students’ participation is high relative to their small share of the
population, only 5%, and that their threads are more popular. In other
words, threads of representative students attract a disproportionately
large share of the discussion, even without counting the possible
sub-discussions they initiate within a thread.
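
The three Forum Behavior indicators can be computed from thread records along the following lines. This Python sketch makes several assumptions that the paper does not state: each thread is given as (initiator, list of post authors), Over Rate is taken as the plain difference between the two averages, and the forum is assumed to contain at least one thread and one post. The function name is hypothetical.

```python
def forum_indicators(threads, representatives):
    """Return (P(R|T), P(R|P), Over Rate) for one course forum.

    threads: list of (initiator_id, [author_id of each post]).
    representatives: set of representative student ids.
    """
    total_threads = len(threads)
    total_posts = sum(len(authors) for _, authors in threads)
    rep_thread_sizes = [len(authors) for initiator, authors in threads
                        if initiator in representatives]
    rep_posts = sum(1 for _, authors in threads
                    for author in authors if author in representatives)
    p_r_t = len(rep_thread_sizes) / total_threads
    p_r_p = rep_posts / total_posts
    avg_all = total_posts / total_threads
    # With no representative-initiated thread the deviation is set to 0.
    avg_rep = (sum(rep_thread_sizes) / len(rep_thread_sizes)
               if rep_thread_sizes else avg_all)
    return p_r_t, p_r_p, avg_rep - avg_all
```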

The Learning Behavior columns show the video-watching and
quiz-submission behavior of representative students. The values of the
two indicators, $P(R|V)$ and $P(R|Q)$, suggest that the degree of
learning behavior of representative students is low relative to their
forum participation but in most courses still larger than 5%. We can
therefore infer that representative students’ learning behavior is
just above average. The value of $P(R|Q)$, which is related to the
final certificate, also suggests that their motivation is positive.

The Instructor columns demonstrate the necessity of preferentially
answering the threads of representative students. Precise suggests
that instructors spent almost one third of their effort on answering
representative students’ questions, while Recall suggests that
instructors answered about two thirds, and up to 100%, of the threads
initiated by representative students. These historical records show
that it is worthwhile for instructors to discover the representative
students and their posts, since both the range of candidate posts and
the time needed to choose which post to reply to are reduced. The Over
Rate indicator also implies that preferentially answering the threads
of representative students lets a larger audience benefit indirectly,
without receiving a direct response from the instructor.

Figure 5: Number of standard deviations by which representative
students outperform non-representative students on grades, averaged
over all courses.

Figure 6: Number of standard deviations by which representative
students outperform non-representative students on grades per course,
compared with superposters by quantity.
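
The two Instructor indicators follow directly from their definitions in the caption of Table 5, as the Python sketch below shows. The record format (initiator, list of post authors), the function name and the convention of returning 0.0 when a denominator is empty are assumptions made for this illustration.

```python
def instructor_indicators(threads, representatives, instructors):
    """Return (Precise, Recall) as defined for Table 5.

    threads: list of (initiator_id, [author_id of each post]).
    representatives, instructors: sets of user ids.
    """
    instr_posts_all = 0      # all instructor posts in the forum
    instr_posts_in_rep = 0   # instructor posts in rep-initiated threads
    rep_threads = 0          # threads initiated by representatives
    rep_threads_answered = 0 # of those, threads with an instructor reply
    for initiator, authors in threads:
        n_instr = sum(1 for author in authors if author in instructors)
        instr_posts_all += n_instr
        if initiator in representatives:
            rep_threads += 1
            instr_posts_in_rep += n_instr
            if n_instr > 0:
                rep_threads_answered += 1
    precise = instr_posts_in_rep / instr_posts_all if instr_posts_all else 0.0
    recall = rep_threads_answered / rep_threads if rep_threads else 0.0
    return precise, recall
```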

Next we analyze the performance of representative students. Still in
Table 5, the Performance columns give the proportions of certified
representative students. $P(R|C)$ and $P(R|C,D)$ are the indicators
for students who passed and for students who passed with distinction,
respectively. The values indicate that in most courses representative
students account for a higher proportion of the excellent students
than of the passing students. However, considering the large forum
participation and positive motivation of representative students,
there is potential to raise their passing rate further. They therefore
deserve extra attention from instructors.

Figure 5 shows the number of standard deviations, computed from
z-score grades averaged over all courses, by which the average grade
of the top-ranked students exceeds that of the remaining students,
comparing four different ranking metrics. Superposters by quantity
(PM), superposters by reputation (RE) and representative students by
JRW all outperform their peers. However, the score of JRW is lower
than that of PM. This suggests that representative students perform
better than their peers but are not the group with the best scores;
the top 5% of students who post the most have a higher average score.
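
One way to obtain such a number for a single course is sketched below in Python: grades are converted to z-scores within the course, and the mean z-score of the non-selected students is subtracted from that of the selected group. The function name, the use of the population standard deviation and the exact form of the comparison are assumptions; the sketch also assumes that both groups are non-empty and that grades are not all identical.

```python
from statistics import mean, pstdev


def sd_advantage(grades, selected):
    """Mean z-score of the selected group minus that of the rest.

    grades: dict mapping student id -> final grade in one course.
    selected: set of ids of the top-ranked (e.g. top 5%) students.
    """
    values = list(grades.values())
    mu, sigma = mean(values), pstdev(values)
    z = {sid: (g - mu) / sigma for sid, g in grades.items()}
    in_group = [z[sid] for sid in grades if sid in selected]
    others = [z[sid] for sid in grades if sid not in selected]
    return mean(in_group) - mean(others)
```

Averaging this value over the courses gives one bar per ranking metric, and keeping it per course gives a per-course comparison.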

At the level of individual courses, representative students’
performance varies. Figure 6 shows the same standard deviations per
course. Representative students do not outperform their peers in some
courses. Superposters and representative students show almost
consistent trends, except for General Chemistry. Representative
students’ grades are lower than those of superposters by quantity in
most courses, which again suggests that representative students
perform above average but are not the best. A possible explanation is
that, as in offline classes, representative students who find the
course content hard to master ask more questions and need more
instruction, while superposters by quantity are good at the course and
often answer questions. In summary, representative students are
characterized by heavy participation in discussions, moderate learning
behavior, and above-average but not the best performance.

6.2 Visualization Tool for Instructor 

With such varied forms of data, a straightforward visualization tool
can help instructors evaluate representative students and monitor
their behavior. To put the model proposed in previous sections into
practice, we scale the final ranking scores to 0-100 as an index score
and develop a web service whose interface is shown in Figure 7.
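
The paper does not state how the scores are scaled. A simple possibility is linear min-max scaling, sketched below in Python; the function name and the choice of min-max scaling (rather than, say, dividing by the maximum) are assumptions made only for illustration.

```python
def to_index_scores(scores):
    """Linearly rescale ranking scores to the range 0-100.

    scores: dict mapping student id -> raw JRW ranking score.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Degenerate case: every student has the same raw score.
        return {sid: 100.0 for sid in scores}
    return {sid: 100.0 * (v - lo) / (hi - lo) for sid, v in scores.items()}
```

The same kind of per-dimension scaling between the class minimum and maximum could serve the axes of a radar chart.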

Here we present the typical usage scenario of the service. Instructors
choose which course to view (Figure 7 A). Role and permission
administration to protect privacy will be added in the future; the
current version only demonstrates the use cases. Instructors then
choose how many top-ranked students to display, up to the whole class
(Figure 7 B). Instructors can also select whether to view the
representative students’ behavior (Figure 7 C) or their post contents
(Figure 7 D). The main display area (Figure 7 C) is a table in which
instructors can see the top students’ various behaviors, including
forum participation, learning behavior and performance, as well as
each student’s influence index and role in the forum. If instructors
select ‘influential post’, the main area is replaced by the post
contents written by representative students (Figure 7 D). We envisage
that, in the future, Figure 7 D should provide functions for
instructors to respond, rate, provide feedback and perform other
post-related operations like those in normal forum discussion
settings. With the menu tab ‘Influencer’ selected, if instructors
click the radio button in front of a record in the list, the behavior
of the corresponding student is also presented in the radar chart
(Figure 7 E). The radar chart displays six dimensions of students’
behavior: quiz submission, video watching, vote, response, initiated
thread, and final score. The scale of each dimension ranges from the
minimum to the maximum of the class. There are two closed hexagons on
the radar chart. The fixed one in the middle denotes the average
values of the whole class, while the other, which changes with each
radio-button click, indicates the behavior of the selected student.
The radar chart helps instructors compare the behavior of each student
with the whole class along different dimensions.

Figure 7: Web service interface

According to our observation and interviews, this web service gives
instructors a way to understand the class macroscopically and to get
prompt feedback on the main concerns in the forum. Note that, thanks
to the speed of our algorithm, the web service can refresh in real
time as students’ forum behavior changes.

7. CONCLUSION AND FUTURE WORK 

In MOOC forum settings, different participants may define influence
differently. We take the side of instructors and assume that the
influencers in MOOC forums are representative students who stimulate
and attract much forum participation. Compared with superposters, they
are characterized by lively engagement in forum discussions but by
learning behavior and performance that are less outstanding than
expected. They deserve extra attention from instructors in order to
improve the course passing rate. Since they aggregate much discussion,
they can help amplify instructors’ answers and play a latent role in
knowledge propagation. Through representative students’ influence,
instructors can quickly identify the hot topics that concern most
students. TAs’ workload can be evaluated incidentally. In general, it
is meaningful for instructors to preferentially read and answer
representative students’ threads.

In this paper, we leverage methods and algorithms of social network
analysis to model MOOC forums, in order to better understand MOOC
social learning settings and to provide a basis for instructors to
intervene in social learning. The model has the advantage of fully
utilizing social information and textual messages to identify and rank
students’ influence. Based on students’ behavior, performance and post
contents, instructors can then take measures to improve teaching
quality, ideally assisted by the web-based visualization tool.

Nevertheless, much future work remains to refine the discoveries in
this paper. We will try other kinds of heterogeneous networks with
more forum information and explore the effect of the parameters. Other
ranking algorithms, such as HITS and topic-based ones, might be more
effective. Furthermore, whether the amplification of knowledge
propagation via representative students is effective, and whether
teaching quality can be improved, still need to be verified by
integrating our visualization tool into a practical platform and
running specifically designed subsequent courses.